Some full text articles may not yet be available without a charge during the embargo (administrative interval).
-
Ozay, Necmiye; Balzano, Laura; Panagou, Dimitra; Abate, Alessandro (Ed.)
The pursuit of robustness has recently been a popular topic in reinforcement learning (RL) research, yet existing methods generally suffer from computational issues that obstruct their real-world implementation. In this paper, we consider MDPs with low-rank structures, where the transition kernel can be written as a linear product of a feature map and factors. We introduce *duple perturbation* robustness, i.e., perturbation of both the feature map and the factors, via a novel characterization of (ξ, η)-ambiguity sets featuring computational efficiency. Our low-rank robust MDP formulation is compatible with the low-rank function representation view and is therefore naturally applicable to practical RL problems with large or even continuous state-action spaces. It also gives rise to a provably efficient and practical algorithm with a theoretical convergence rate guarantee. Lastly, the robustness of our proposed approach is justified by numerical experiments, including classical control tasks with continuous state-action spaces.
Free, publicly-accessible full text available June 4, 2026
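The low-rank structure mentioned in the abstract above can be illustrated with a small tabular sketch. This is not the paper's algorithm or its ambiguity-set construction; it only shows a transition kernel factored as a feature map times factors, and why expected next-state values are then linear in the features. The helper `make_low_rank_kernel`, the sizes, and the Dirichlet sampling are all assumptions made for illustration.

```python
import numpy as np

def make_low_rank_kernel(n_states, n_actions, rank, seed=0):
    """Sample a hypothetical low-rank transition kernel P = phi @ mu."""
    rng = np.random.default_rng(seed)
    # phi: one row per (state, action) pair, a convex weighting over latent factors.
    phi = rng.dirichlet(np.ones(rank), size=n_states * n_actions)  # (S*A, d)
    # mu: one row per latent factor, a distribution over next states.
    mu = rng.dirichlet(np.ones(n_states), size=rank)               # (d, S)
    return phi, mu

n_states, n_actions, rank = 5, 2, 3
phi, mu = make_low_rank_kernel(n_states, n_actions, rank)

# Transition kernel P(s' | s, a) = <phi(s, a), mu(s')>, stored as an (S*A, S) matrix.
P = phi @ mu

# For any value function V, E[V(s') | s, a] = <phi(s, a), mu @ V>:
# the expectation is linear in the d-dimensional features.
V = np.arange(n_states, dtype=float)
expected_next_value = phi @ (mu @ V)
```

Because each row of `phi` is a convex weighting and each row of `mu` is a distribution, every row of `P` is a valid probability distribution, and the rank of `P` is at most `rank`.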
-
Li, Yingzhen; Mandt, Stephan; Agrawal, Shipra; Khan, Emtiyaz (Ed.)
Off-policy evaluation (OPE) is one of the most fundamental problems in reinforcement learning (RL): estimating the expected long-term payoff of a given target policy using *only* experiences from another, potentially unknown, behavior policy. The distribution correction estimation (DICE) family of estimators has advanced the state of the art in OPE by breaking the *curse of horizon*. However, the major bottleneck in applying DICE estimators lies in the difficulty of solving the saddle-point optimization involved, especially with neural network implementations. In this paper, we tackle this challenge by establishing a *linear representation* of the value function and the stationary distribution correction ratio, i.e., the primal and dual variables in the DICE framework, using the spectral decomposition of the transition operator. Such a primal-dual representation not only bypasses the non-convex non-concave optimization in vanilla DICE, thereby enabling a computationally efficient algorithm, but also paves the way for more efficient utilization of historical data. We highlight that our algorithm, SpectralDICE, is the first to leverage the linear representation of primal-dual variables in a way that is both computationally and sample efficient; its performance is supported by a rigorous theoretical sample complexity guarantee and a thorough empirical evaluation on various benchmarks.
Free, publicly-accessible full text available May 3, 2026
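The primal and dual variables named in the abstract above can be made concrete in an exact tabular setting. The sketch below is not SpectralDICE and involves no learning or spectral decomposition; it only checks, on a randomly generated MDP, that the value-based (primal) estimate of the normalized discounted return matches the estimate obtained by reweighting behavior data with the correction ratio (dual). The MDP, both distributions, and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9

# Hypothetical tabular MDP: P[(s, a), s'], rewards r[(s, a)], initial distribution mu0.
P = rng.dirichlet(np.ones(S), size=S * A)   # (S*A, S)
r = rng.uniform(size=S * A)                 # (S*A,)
mu0 = rng.dirichlet(np.ones(S))             # (S,)

# Target policy pi[s, a], embedded as a matrix Pi mapping states to (s, a) pairs.
pi = rng.dirichlet(np.ones(A), size=S)      # (S, A)
Pi = np.zeros((S, S * A))
for s in range(S):
    Pi[s, s * A:(s + 1) * A] = pi[s]

P_pi = P @ Pi        # (S*A, S*A): transitions between state-action pairs under pi
nu0 = mu0 @ Pi       # initial state-action distribution under pi
I = np.eye(S * A)

# Primal variable: Q solves Q = r + gamma * P_pi @ Q.
Q = np.linalg.solve(I - gamma * P_pi, r)
rho_primal = (1 - gamma) * nu0 @ Q

# Dual variable: the discounted occupancy d_pi solves
# d = (1 - gamma) * nu0 + gamma * P_pi.T @ d.
d_pi = np.linalg.solve(I - gamma * P_pi.T, (1 - gamma) * nu0)

# Behavior data distribution with full support, and the correction ratio zeta.
d_D = rng.dirichlet(np.ones(S * A))
zeta = d_pi / d_D
rho_dual = np.sum(d_D * zeta * r)           # E_{d_D}[zeta * r]
```

In this exact setting `rho_primal` and `rho_dual` agree up to floating-point error. In practice `d_D` is only available through samples and `zeta` must be estimated, which is where the saddle-point optimization described in the abstract arises.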
